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Abstract 

We consider importance sampling as well as other properly weighted samples with 
respect to a target distribution tt from a different point of view. By considering the 
associated weights as sojourn times until the next jump, we define appropriate jump 
processes. When the original sample sequence forms an ergodic Markov chain, the as- 
sociated jump process is an ergodic semi-Markov process with stationary distribution 
tt. Hence, the type of convergence of properly weighted samples may be stronger than 
that of weighted means. In particular, when the samples are independent and the 
mean weight is bounded above, we describe a slight modification in order to achieve 
exact (weighted) samples from the target distribution. 
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1 Introduction 

Importance sampling (IS) (Marshall, 1956) is a well-known Monte Carlo method that is 
useful in any discipline where integral approximations are needed. For a measurable space 
(X, B{X)) and a probability distribution tt defined on it, the method attempts to estimate 
the integral 

E 7r (/i) := / h(x)n(dx), 
Jx 

for h € J C 1 (7r), by drawing independent samples x\, . . . , x n from a trial distribution g with 

support at least that of tt. Assuming that the distributions have densities 7r(x), g{x) with 
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respect to a a-finite measure fx [i.e., ir(dx) = w(x)fj,(dx), g(dx) = g(x)(i(dx)], E n (h) is 
approximated by 

uis ._ E"=i w(xi)h(xj) m 
n n .- n , (ij 

or by 

lis ._ EjU w{xi)h{xi) 

where w(xj) := ir(xi)/g(xi). In standard terminology, g is called "the importance distri- 
bution" and io(xj)'s "the importance weights". 

The validity of /t n and /i n as approximations of E„-(/i) is justified by the strong law 
of large numbers which ensures that for all h € £ 1 (vr) it holds rT x Yl w{Xi)h(Xi) — > 
E 9 {w(X)/i(X)} = E^(/i) and n" 1 £>pf;) ^ E 9 {u>(A)} = 1 with probability one. Note 
that h n can be used in more general settings than h n , e.g. when the importance weights 
w(x) are known up to a multiplicative constant. In this case the second limit in general 
differs from one whilst the first becomes E g {w(X)}E 7r (h). Also note that the assumption 
of independent samples can be relaxed since there are more general contexts under which 
the above IS estimators converge to E 7r (/i). For example, if the sequence of A"s forms a 
Harris ergodic Markov chain with stationary distribution g, the above limits still hold due 
to the Ergodic Theorem. 

Besides integral approximations, simulation methods aim to generate samples from a 
target distribution. In this respect, IS seems at first glance to fail obtaining samples from it 
since all draws are made from the importance distribution g. The Sampling/Importance 
Resampling method (SIR, Rubin, 1987) is an attempt to circumvent this drawback by 
weighted resampling with replacement from the generated g-sample. The weight assigned 
to Xi is its normalised importance weight w(xi)/ Y17=i w ( x i)- As n — > oo, this approach 
produces a sample which is approximately 7r-distributed. Smith and Gelfand (1992) re- 
visited the approach and proved the assertion by considering the resampling procedure as 
weighted bootstrapping. 

The aim of the present paper is to show that, under certain conditions, when the g- 
sample is properly weighted it converges in a sense to the target distribution tt. Actually, 
this is true for a jump process associated with the weighted sample. From this point of 
view, importance weighting does not differ much from MCMC sampling schemes. In fact, 
some of them are special cases of the above mentioned jump processes (e.g. the Metropolis- 
Hastings algorithm, see Subsection 3.1). It turns out that in order to obtain approximate 
samples from ir resampling is not needed at all. However, our intention is to present these 
facts without to criticise SIR. 

The paper is organised as follows: In Section 2 we define the jump process associated 
with a weighted sample and prove that under certain conditions it converges weakly to 
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the target distribution. Section 3 contains some examples of known sampling schemes 
which are special cases of this context. As a result of independent interest, we give an 
upper bound for the total variation distance to stationarity of Markov jump processes 
with embedded Doeblin chains when the mean sojourn time is bounded above. In Section 
4 we discuss the case of stationary weighted sequences and show that in some cases it is 
possible to locate the time after which the associated jump process reaches equilibrium. 
Finally, an Appendix contains proofs of the stated propositions. 

2 Jump processes associated with properly weighted se- 
quences 

The concept of a properly weighted sample has been introduced by Liu and Chen (1998) 
(see also Liu, 2001) as a generalisation of the standard IS method. 

Definition 2.1. [Liu and Chen (1998)] A set of weighted random samples (Xj,£j)i^^ n is 
called proper with respect to ir if for any square integrable function h, 

B{Zih(Xi)} = KE % {h{Xi)}, for i = 1, . . . , n, 

for some positive constant k. 

As Liu (2001) points out, an equivalent definition is the following: 

Definition 12.11( a) A set of weighted random samples (Aj, £i)i^i^ n is called proper with 
respect to tt if 

E{£,i\Xi = x} = Kn(x)/g(x), for i = 1, . . . ,n, 

for some positive constant k, where Xi ~ g. 

In the sequel, we will associate with any infinite weighted random sequence a jump 
process in the following sense. 

Definition 2.2. Consider a weighted sequence (A n ,£ n ) ne z + := ((Ao, £o)> (Ai, £i), . . .), 
where the £ 's are strictly positive weights. Define So = 0, S n = Y17=o n ^ 1> an< ^ ^ 

N t := sup{n : S n ^t}, t^ 0. 

Then, the stochastic process Y = (lt)^o defined by Yt := X^m, t ^ 0, will be called the 
jump process associated with the weighted sequence (A n ,£ n ) ne z + - 

The definition ensures that the process Y has right continuous sample paths which 
also have left hand limits. However, if the support of £ n 's is a subset of N = {1, 2, . . .}, 
we will consider the process Y only for t £ Z+, i.e. we set Y = (Yq,Yi,Y2, ■ ■ ■)■ If this is 
the case, limits of quantities related to Yt should be suitably interpreted. 
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Proposition 2.1. Assume that the sequence X = {X n ) n& z + is a homogeneous Harris 
ergodic Markov chain with state space (X,B(X)) having an invariant probability distribu- 
tion g and the distribution of £ n depends solely on X n with E{£ n |X n = x} = kw{x) = 
nir(x)/g(x) for some k > 0. Then, for the jump process (Yj)t^o associated with the weighted 
sequence (X n ,^ n ) n£ z + it holds that 

limP{Yt G A} = n(A), V A G B{X). 

Proof. The result follows from the standard theory of semi-Markov processes (cf. Limnios 
and Opri§an, 2001). Under the above assumptions, Y is an ergodic semi-Markov process 
with embedded Markov chain X and respective sojourn times (£n)nez + - Thus, 

/ A E{£|x}s(a;Md2;) / A Kw(x)g(x)^(da;) vr(x)/i(dx) 

lim Pjit G A} = „ — — - — — — = -i — — — = -7 — — — — = ir(A) 

t?oo J x E{£ |xj(7(x)/u(dx) Kw(x)g(x)iJ,{dx) J x ir{x)[i(dx) 

as is claimed. □ 
Setting deterministically £ n = w(X n ), we have the following: 

Corollary 2.1. If {X n ) n& z + forms a Harris ergodic Markov chain with stationary distri- 
bution g, then the jump process associated with the weighted sequence (X n ,w(X n )) n£ z + 
has it as limit distribution. 

Any sequence of independent ^-distributed random variables forms trivially an ergodic 
Markov chain with stationary distribution g. Thus, Corollary 12. II covers also the original 
importance weighted sequence. 

Example 2.1. Let the target distribution be the normal mixture tt ~ g A/"(0,3 2 ) + 
g JV(5, 1) + § JV(15, 2 2 ) and g ~ C(0, 10) (i.e. centered Cauchy with scale 10). We have 
run the standard IS algorithm m = 10000 times independently and for each run we have 
recorded the values Y\, I3 and Y10 of the associated jump process. The corresponding 
histograms in Figure ^ clearly illustrate the distributional convergence to it. 

Under the assumptions of Proposition 12. 1| the weighted average 

t ._ Ym=Q jiK X i) ,o\ 
Z^i=0 Si 

converges almost surely to E 7r (/i) for any h G £ 1 (7r). Furthermore, if the Central Limit 
Theorem holds for h n , then its asymptotic variance is 

a\h) = a 2 IS (h) + \ E g [Vav{C\X}{h(X) - E^h)} 2 ] , 
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Figure 1: Histograms of the values of Yi, Y3 and Y\o of m = 10000 independent IS runs 
with target distribution vr ~ | Af(0, 9) + | AA(5, 1) + ± A/"(15, 4) and 5 ~ C(0, 10). 



where &jg(ti) is the asymptotic variance of the IS estimator hl~ in @ (which arises using 
£ n = Kw(X n ) deterministically) . This follows immediately from the identity 



K^giwiXtfhiXMiXj)} + EgiVai^XiMXiWXi)}, i = j, 
K^giwiX^hiX^wiX^fiXj)}, i + j, 



and the formula for the asymptotic variance of a ratio estimator. As is intuitively rational, 
randomisation of the weights increases the variance. Hence, for estimation purposes the IS 
estimator is always more accurate. Otherwise, the less variable the weights are the more 
accuracy is achieved. 

In light of Definition 12.21 the estimator © can be expressed as 
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hn = ^ h(Y s )u(ds), 

£>n JO 

where v is either the Lebesgue or the counting measure depending on whether the weights 
are continuous or discrete. Note that in the case of standard IS estimator (i.e. when 
£ n = Kw(X n )), this gives another justification for the use of in © instead of h^f in 

©• 

Proposition 12.11 establishes a stronger result than that of the convergence of Cesaro 
averages 

ht := t- 1 ( h(Y s )u(ds) 
J 

to E 7r (ft.). It states that there is distributional convergence of the generated sequence to 
the target distribution analogous to that of MCMC schemes. Thus, properly weighted 
samples and in particular the output of the standard IS method are in general serious 
competitors of them. 

The requirement that the distribution of £ n depends only on x n seems rather restrictive. 
However, x n could be a block of specific size allowing £ n to depend on more than one term 
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of the sequence. (Note that the standard definition of a semi-Markov process allows the 
sojourn time of X n depending on both X n and X n+ ±.) This is illustrated via two examples 
in Subsection 3.4. 



3 Examples 

In this section we discuss some known simulation schemes which are special cases of the 
jump process context. More specifically, we refer to cases where the conditional distri- 
bution of the weights is geometric (discrete case) or exponential (continuous case) and 
thus the associate jump process is a pure Markov jump process. Moreover, we consider 
two IS estimators used in diagnosing convergence of MCMC schemes under the current 
perspective. 

3.1 The Metropolis-Hastings algorithm 

Consider an arbitrary Metropolis-Hastings (MH, Metropolis et al., 1953; Hastings, 1970) 
algorithm with target distribution ir and proposal q(-\-), that is, at time t+1 given Y t = y, 
draw Z ~ q(z\y) and set Y t+ \ = z with probability 

2 )= mm (l, 444^1 (4) 



n(y)q(z\y) 

or Yf+i = y with probability 1 — a(y,z). Although it is well-known that the algorithm 
defines a reversible Markov chain with stationary distribution ir, let us consider it from a 
different point of view. 

Let X = (X n ) n< =z + be a Markov chain with transition density 

a{xi-i,Xi)q{x i \xi- 1 ) min{7r(xj_i)g(xi|xj_i), 7r(xi)g(xj_i |xj)} 



g(xi\x 



J a{xi-i,z)q{z\xi-i)n{dz) f mm{7t(xi-i)q(z\xi-i), n(z)q(xi-i \z)}fi(dz) ' 

(Notice that this is exactly the density of the accepted states of the above MH algorithm.) 
It can be easily verified that g(xi\xi-i) satisfies the detailed balance condition 

g(xi-i)g(xi\xi-i) = g(xi)g(xi-i\xi), 

where g(x) cx f min{ir(x)q(z\x), n(z)q(x\z)}n(dz). This function, when normalised, re- 
sults to a probability density function since 

fg(x)n(dx) f fir(x)q(z\x)n(dz)n(dx) = 1, 

hence it is the stationary distribution of the Markov chain X. Weight now Xi by £j drawn 
from the geometric distribution with probability mass function 

P{£\ x i) = {f a(xi, z)q(z\xi)fi(dz)} {l - / a(xj, z)q(z\x i )fi(dz)Y~ 1 , ^ = 1,2, . . . . 
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Since 

E{£\xi} = {f a(xi,z)q(z\xi)fi(dz)} 1 oc ir(xi)/g(xi), (5) 

the sequence (X n , £ n ) n& z + is properly weighted with respect to n. It is immediately seen 
that the associated jump process is the original MH output {Yt)t&z+ which is known to 
be a pure Markov chain (rather than a general discrete time semi-Markov process). 

The above analysis suggests that we are allowed to use any distribution (beyond the 
geometric) for the weights provided © is satisfied. However, direct calculation of the 
importance weight is in general computationally costly or even infeasible making such a 
task hard to accomplish. Moreover, the geometric distribution comes out naturally, since 
each simulation from g{\) automatically generates the corresponding geometric weight. 

An important issue is the rate of convergence of the resulting chain to the target 
distribution. In the case of independence MH, i.e. when q(y\z) = q(y), Mengersen and 
Tweedie (1996) showed that if w* = sup x 7r(x)/q(x) < oc, the algorithm is uniformly 
ergodic with ||P{lf G •} — vr|| < (1 — 1/uj*)*, t G Z+, where \\fi\\ = sup A&B ^ \n(A)\ 
denotes as usual the total variation of the signed measure [i. In general, this is the case if 
sup x z ir(x)/q(x\z) < oo. 

3.2 Independence samplers with geometric weights 

With the term independence sampler we refer to a simulation scheme where the generated 
(unweighted) random variates are mutually independent. For instance, this excludes the 
independence MH algorithm. 

Sahu and Zhigljavsky (2003) and Gasemyr (2002) proposed independence samplers 
with geometric weighting distributions resulting in ordinary Markov chains. The in- 
dependence sampler of Sahu and Zhigljavsky (2003) can be described as follows. Let 
Z = (Zrt)neN be a sequence consisting of iid random variates from some distribution 
g. To each Z n it is associated a weight £ n drawn from the geometric distribution with 
probability mass function 

if kw(z) 1 m 
P(Cn = m \Z n = z) = — — — - < — — r-r^- > , m = 0,1,2,..., 

1 + KW(Z) { 1 + KW(Z) J 

where w(x) := ir(x)/g(x). When £ n = then the corresponding Z n is rejected. Let 
X = (X n ) n£ N be the sequence of the accepted Z n 's, i.e. those having been weighted by 
£ n > 0. Clearly, the sequence X also consists of iid draws but from the distribution 

g(x)P(£, > 0\x) _ ir(x)/[l + kw(x)} 
9{X) = f x g(z)P(£>0\zMdz) = J x Tr(z)/[l+Kw(z)UdzY 

Moreover, X n is weighted by £ n which is generated from the (truncated) geometric distri- 
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bution with probability mass function 

P(g ra = m\X n = x) = 1 { 1 , m = l,2,.... 

1 + Kwyx) y 1 + KW(X) J 

Since 

E{e„|X n = x} = 1 + k^z) oc u;(x) := ^4 = I / : ^4 ; M(d^) j {1 + kw(x)} 

9{x) yj x l + Kw{z) J 

the sequence (X n ,^ n ) n£ ^ is properly weighted with respect to tt and thus the associated 
(discrete time Markov) jump process Y = {Yt)t&z + converges to tt. Moreover, if w* = 
sup x Tr(x)/g(x) < oo, the total variation distance between P{Y t £ •} and tt is no greater 
than (1 + kw*)'*. 

Gasemyr (2002) generalised the above sampler by modifying the rejection rule. After 
Z n ~ g is drawn, Gasemyr accepts or rejects it according to the result of a Bernoulli 
trial with some probability of success q(z n ). Provided it is accepted, it is weighted by a 
geometric random variate with probability mass function 

P(U = m\Z n = z) = a(z){l - a(z)} m -\ m = 1, 2, . . . , 

where a (z) oc q(z)/w(z). Taking in particular a(z) = {1 + kw(z)}^ 1 and q(z) = kw(z){1 + 
kw(z)}^ 1 the sampler reduces to that of Sahu and Zhigljavsky (2003). Gasemyr (2002) 
shows that the choice q(z) = min{l, kw(z)} and a(z) = min{l, 1/kw(z)} is optimal in the 
sense that it minimizes the asymptotic variance of the estimators h n in (j2J) . 

As before, let X = (X n ) ne ^ be the sequence of the accepted Z's. For the above 
optimal choice of q, the X n 's are independent draws from the distribution 

g(x) min{l, kw(x)} tt(x) min{l, 1/kw(x)} 

J x g(z) min{l, Kw(z)}/J,(dz) f x ir(z)mm{l,l/nw(z)}(j,(dz) 

and 

^{in\X n = x} oc w(x) := = yj^ n(z) min j 1, I n(dz) \ max{l,K,w(x)}. 

Hence, the sequence (X n , cjn)neN is properly weighted with respect to ir and thus, the 
associated (discrete time Markov) jump process Y = {Yt)t€Z+ converges to tt. Once again, 
when w* < oo, this Markov chain is uniformly ergodic. When k > 1/w*, an upper bound 
of the total variation distance between P{lf G ■} and tt is (1 — l/KW*) t . The case k ^ 1/w* 
correspondes to rejection sampling and consequently Y t ~ 7r, V t G Z + . 

3.3 Exponential weights: Pure Markov jump processes 

Consider the case where conditional on X n = x, £ n follows an exponential distribution 
with mean kw(x). In this case, the associated jump process is a continuous time pure 
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Markov jump process. A particular case of such a sampling scheme is the birth and death 
MCMC algorithm (Stephens, 2000, Cappe et al., 2003). 

When w(x) is bounded above, the process exhibits a uniformly ergodic behavior. We 
state this assertion when X is a Doeblin chain in the following proposition the proof of 
which can be found in the Appendix. 

Proposition 3.1. Let (A" n ,£ n ) ng z + be a weighted sequence satisfying the following : 

1. The sequence X = (A n ) ne z + is an ergodic Markov chain with stationary distribution 
g and its transition distribution g{-\ ) satisfies a Doeblin condition, equivalently, there 
exists a probability distribution g$ and a nonnegative constant (3 such that 



2. The conditional distribution of £ n given X n = x is exponential with mean nw{x) = 
K7r(x)/g(x), where it is a probability distribution, and w* = sup x <z X w(x) < oo. 

Then, for the associated Markov jump process Y = (Ij)t^o it holds 



In case the X n 7 s are independent ^-distributed random variates, © holds with (3 = 1 
and go = g, thus the total variation distance in Q is no greater than exp{— t/nw*}. 
This bound is comparable to that of the previous cases, although the actual importance 
distribution is different (since here it has not been taken care for rejection control). 

Example 3.1. Consider again the distributions of Example 12. II for which it can be verified 
that w* sa 6.905. Let X = (X n ) ne z + be iid Cauchy C(0, 10) random variates and £ n be 
exponential with mean w(x n ). Then, by Proposition f° r the associated Markov jump 
process Y it holds ||P(^ G •) -7r|| < exp{-t/6.905}, t ^ 0. For instance, for t ^ 31.8, the 
total variation distance will be less than 0.01. 

3.4 Importance weighting an MCMC output 

Let 2/o>2/i> J/2) • • • be the output of any MCMC updating scheme having target distribution 
7r and updating distribution Tr{y n \y n -i). A crude method for diagnosing convergence of 
the chain to tt is checking the convergence of many estimators of some quantities of interest 
E 7r (/i) (cf. Robert and Casella, 1999, p.382). 

One particular estimator used in this context is the IS estimator 



g(z\y) > 09o(z), V z,y G X. 



(6) 



tt || s$ exp{-(3t/Kw*}, t ^ 0. 



(7) 
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provided that the ratios ^{Vi) /^{Vi\yi-i) can be calculated up to a constant. By setting 
£i := 7r(yj)/7r(yi \yi-i), it can be seen that the jump process associated with the weighted 
sequence (?/i,£i)j£N also converges to tt. Indeed, let Xj = (zj , x| ) := (yi-i,y%)- Then, 
the sequence {xn)n£N forms a Markov chain with transition density 

i \ \ x ( i ( 2 )t 

where <5 X (-) denotes the Dirac measure at x, and limit distribution g(x) = tt(x ( - 1 ^)tt(x ( - 2 ^ \x^) 
Setting 

= 7T(xf ) )7r(xf ) ) _ 7r(^) - = 12 

we are led to the sequence (xj,£j)j<=N which is properly weighted with respect to the 
product 7r(x( 1 ))7r(x( 2 )). Thus, the jump process associated with the (marginal) sequence 
(x\ ,^«)ieN = (yi,£,i)i£N has 7r as limit distribution. 

Robert and Casella (1999) discuss another approach providing convergent estimators in 
the context of MH algorithms. As in Subsection 3.1, consider a general MH algorithm with 
target distribution tt and proposal q(-\-). Denote now by yo,Ui, ■ ■ ■ , Um the whole simulated 
output (containing also the rejected samples) and by yo, y ai , . . . , y an the accepted variates. 
Define 73 := supjcr.,- : o~j < i} and notice that y n is the last accepted value before time i 
implying that yi has been generated from q(-\y n ). The IS estimator of E 7r (/i) used in this 
context is 

^qiViWn) I fr! q(Vi\VTi)' 

Setting £i = ir(yi)/q(yi\y Ti ), it can be seen that the jump process associated with the 
weighted sequence Ci)ieN converges to n. To see this, define x« = (xf~ , xf^) := (y n ,yi). 
If Ti < i — then 75 = Tj_i and thus x^ = ■ On the other hand, t% = % — 1 implies that 
x^ = x^. The latter occurs with probability a(z^_ 1 , x^_ x ) (given in (J3J), whereas the 
former with probability 1 — a(x[^} 1 ,x^ l ). Hence the sequence xi,X2, • . • forms a Markov 
chain with transition density 

3;! 



The stationary distribution of this chain is 7r(x^)q(x^ \x^) since ?/0) 2/td 2/t 2 5 • • • is the 
original MH output. Weighting each Xj by 

Si " 



7 r(xf ) )g(xf ) |x i (1) ) QiVilVn 



(2) 

we obtain the desired result for the marginal sequence (x\ , CiJieN = y/i>£i)ieN- 
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4 Stationary weighted samples 



Hereafter we denote by p{v\x)v{dv) the conditional distribution of £ given x, with v denot- 
ing either the Lebesgue measure (continuous weights) or the counting measure (discrete 
weights). Moreover, we set 

P(u\x) := / p(v\x)v(dv) = P(£ ^ u\x). 

J u.oc) 

In this section, we will discuss a slight modification of the standard weighting method 
under which one can obtain stationary weighted samples from the target distribution ir. 
We will need first some facts from the theory of semi-Markov processes. 

Let (Ytjt^o be a semi-Markov process with embedded Markov chain (A n ) nG z + and 
respective sojourn times (£ n )ngz + - Let also S n and N t be as in Definition 12.21 Then, 
(X n , 5 n ) ne z + is a Markov renewal process and (N t )t^o is its counting process. The corre- 
sponding excess life (residual age) process is defined by (Vt)t^o by Vt := S jVt+i — t- Then, 
(Yt,Vt)t^o is a Markov process with stationary distribution ir(y)p e (v\y), where 

»(>M == ^ 

(cf. McDonald, 1977). 

In the context of weighted samples this suggests the following. Assume that it is 
possible to generate Xq ~ n. Then, if Xq is weighted by £o drawn from p e (-\xo) (instead 
of p(-\xq)), the excess life process (Y tl Vt)t^o starts in equilibrium and thus Y t ~ ir, Vt ^ 0. 
As a consequence, the estimator 



h 



t- 1 j\{Y s )v{&s) = r 1 {Eflo^MXj) + (t- S Nt )h(X Nt )] 



is exactly unbiased for E 7r (/i) for any fixed time t > 0. 

In the case of IS we have that p e (v\y) ~ U(0, w(y)). Hence, if Xq ~ n, U ~ W(0, 1) and 
Xi , X2 , . . • are iid draws from g (or an ergodic Markov chain with stationary distribution 
g), the estimator h t becomes 



ht 



t- 1 [Uw(X )h(X ) + Zfli 1 MXjMXj) + (t - StvXXvJ} . 



In general, simulation from p e (f |y) can be completed in two stages. Generate first 
a random variate T ~ Q(t\y) oc tp(t|y) and then draw V uniformly distributed in (0, T) 
(continuous case) or in {1,...,T} (discrete case). This is a quite easy task for many 
standard distributions. For example, if p is a gamma distribution with shape a, then q is 
also a gamma distribution but with shape a + 1. Or, if p is Poisson then q is a truncated 
Poisson with the same parameter. 
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When (X n ) nG z + is an iid sequence from g and the hazard rate p(v\y) / P (v\y) is uni- 
formly bounded away from zero, i.e. if 

£ ,:= i nf^l>0, (8) 
v >v P[v\y) 

an accept-reject argument can give at once a stationary semi-Markov process. Indeed, let 
be a draw from g(x)p(£\x). Then, 

and consequently if f7 is an independent U(0, 1) random variate and U ^ £*P(£|x)/p(£|x) 
holds, (x, £) can be considered as 7r(x)p e (£|x)-distributed. 

Lemma 4.1. Tjf ZioZds i/ien (%) w* = sup a . g ^ w(x) < oo and (ii) there exists a > 1 suc/i 
i/iai B g {a^} := J J a v g(y)p(v\y)u(dv)/j,(dy) < oo. 

According to the part (i) of this lemma, the above approach does not offer much, since 
the accept-reject method applies also to the x n 's. However, part (ii) can be used to obtain 
a crude bound for total variation distance to stationarity for the associated semi-Markov 
process. 

Theorem 4.1. Assume that (A n ) ne z + is an iid sequence from g and J5J) holds. Then for 
the associated jump process Y = (It)t^o we have the following: 

(a) There is an almost surely finite time r 0, such that Y-t ~ vr for t ^ r. 

(b) The total variation distance between the law of Yt and ir converges to zero exponen- 
tially fast in t. 

Appendix 

In order to prove Proposition 13. II we will need the following lemma. 
Lemma A. 2. Under the assumptions of Proposition^^ it holds 

lim(l-/3(l-e-* / '™*) / e- t/Kw{z) g {z)^{dz)\ ' = exp{-(3/ kw*}. 
t i° [ Jx J 

Proof. Write first 

a(t) := l-/3(l- e -*/™*) / e - i /™( z ) ff0 (z)/i(dz) 

Jx 

= 1- p(l-e' t/KW *) + p(l-e~ t/KW *) [ (l-e- t/Kwiz) )go(z)fi{dz) 

Jx 



(9) 
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Observe that © implies g(z) ^ (3g (z), Vz. Hence, 

P [ (l-e-^VWM^K / {l-e- t / KW ^)g(zMdz)=-P 9 (^^t) = o(l). 



X JX 



Since 1 - er*/™* = O(t), we have that /?(l - e -*/™»") (l - e-^Wjg^zMdz) = o(t) 
and thus, 

1 - - e^™*) < o(t) < 1 - 0(1 - e"*/^*) + o(t), 
Raising to the power of 1/t and taking the limits we obtain the desired result. □ 

Proof of Proposition lff.il Since (Y t )t^o is a Markov process, every <5-skeleton, i.e. any 
sequence (Y n s) n ^z + for fixed 5 > 0, forms a Markov chain with transition kernel P <5 (y, A) = 
P{Y 5 G A|Y = y}- But then, 



P*(2/,A) = P{e ><5,yGA|X = y}+^P{5 m ^<5< < S m+1 ,X m GA|X = y} 

m=l 

> P{£o<$,£i>$,*i€ji|X = l/} 

= {l_ e -*/«»(v)} /" e- 5 l KW ^g{z\y)n(dz) 
J A 

> {l_ e -«/™*} /" e -^W / 5 5o (z) / u(dz) 



eiQtf(A), Vy, 



where 



and 



e 5 := (3{l-e- 5/KW *} [ e~^ KW ^ g (z)fi{dz) 
Jx 



Qs(A) :-- 



Thus, the <5-skeleton chain satisfies a Doeblin condition. It is well known that this implies 

\\P(Y nS G •) - tt|| ^(l-e 5 ) n , Vn€Z+, 
hence, writing t = n8, we conclude that 

||P(Y t e-)-7r|K{(i-^/n) n/ T, Vnez + . 

However, lim(l — £t/n) n ^ = hm(l — et) 1 ^ and the result follows from Lemma A.l. 

n|oo tj.0 

Proof of Lemma \4-l\ (i) We have that 

p(fly) 

g7 , \ > £*, Vv,y p(v\y) > £„P{v\y), Vw,j/ 
P(v\y) 
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p(v\y)v(dv) > / P(v\y)v(dv), Vy 44> 1 > e*Kw(y), My 
< l/«e*, Vy, 

thus w* = sup y w(y) ^ < oo. 

(ii) We will first prove that E{£ m |y} ?J m!/e™, m ^ 1, by induction. By (i) the assertion 
holds for m = 1. Assuming that it holds for some m we have 



E{r|y} = J v m p(v\y)u(di 



^ e, I v m P(v\y)u(dv) = e* \ v m [ / p(u\y)u(du) u(dv) 

<[v,co) J 



e* / [ / v m u(dv) p{u\y)u{du) > £* / / v m dw ) p(«|y)i/(dn) 

'(0,«] / J \Jv=0 



/ m+l 
-—-p(u\y)u(du) = ^-E{r +1 |y}, Vy, 
m+l m+l 

and thus, E{£ m+1 |y} sC (m + l)!/e™ +1 . Now, 

^— ' m! ^— ' ml e™ 1 — loga/e* 

m=0 m=0 ' 

for any a € [1, expje*}), the same holding true also for E 9 {a^} = E g [E{a^|Y}]. 

Proof of Proposition After (X n ,£ n ) has been drawn, generate an independent random 
variate U n ~ U(0, 1). Let N be the first index n S Z + at which U n ^ £*-P(Cn|^n)/p(Cn|2 ; n) 
occurs. Then (Y Sn ,Vs n ) = (X N ,£ N ) ~ 7r(-)p e (-|-). Since Trp e is stationary for (Y t ,V t )t^o, 
we have that It ~ tt for every t ^ t := Sjy. Now, by the standard theory of rejection 
sampling, iV + 1 ~ G(ke*). Moreover, the "rejected" (X, £)'s have the residual distribution 

. g(x)p(v\x) - ne*7r(x)p e (v\x) p(v\x) — e*P(v\x) 
r(x,v) := = g[x) . 

1 KE* 1 KE# 

Notice that for any a such that <Pg(a) := E 9 {a^} < oo it also holds l p r {o J ) := E r {a^} < oo 
since r(x,v) ^ g(x)p(v\x)/(l — ke*) for all x,v. The function <^ r (a) is continuous and 
increasing in a ^ 1 with (/? r (l) = 1> hence there exists p > 1 such that (p r (p)(l — ke*) < 1. 
Then, 

P(r > t) = P(S N > t) < p-*E{/-} = p-<E{^( P n = ^ P '\ (10) 

which converges to zero as t — > oo implying P(t < oo) = 1. 



14 



(b) Let A G B{X). Then, 

\P{Y t G A] -tt(A)\ = \P{Y t G A\t > t}P{r > t} + P{Y t G A\r < i}P{r < t} -tt(A)| 

< P{r > |P{y 4 G A\t > t} -tt(A)| 

< P{r>t}, 

since P{T t G A\r < t} = vr(yl) by (a), and |P{Y t G A|r > t} - tt(A)| < 1. Thus, 
\\P{Y t G ■} -tt|| = sup |P{y t €A}-7r(A)|<P{r>t} = 0(p _ *) 

AeB(X) 

by Pj). 
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